WIP - experimental recursive approach to getting blocks values #336

richardTowers

Ignore - proof of concept because Richard is too lazy to work out how to run the tests.

- Landing pages have "blocks" rather than a body, and
  by convention we want anything in a block that has the
  key `content:` to be put into the search index.
  Because blocks can be arbitrarily nested, we use the
  JSONPath `$.details.blocks..content`.
- However, some blocks (govspeak ones) may be structured
  to contain multiple content types. For example, if a block
  was marked up as content_type: "text/govspeak",
  publishing-api will have automatically created a rendered
  text/html version and will present both of these to the
  search api in the message_queue message. This means that
  this block will be matched by the JSONPath 3 times: once
  for the array that contains the different content items
  (since the key for the whole thing is `content:`), and
  once for each actual content item inside that array
  (since those keys are also `content:`).
- In order to prevent this content appearing 3 times in the
  search index (once processed normally by BodyContent's
  matcher for content_type: "text/html", then once again
  for that content item's `content:` key and once for the
  govspeak content), every time we add a structured array
  like this we add the content items to an ignore set.
  When we get the secondary matches (which aren't arrays),
  we check them against the ignore set. If they're present,
  we delete them from the ignore set and continue without
  presenting them to the search index.
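
A rough sketch of the ignore-set idea, written in Python purely for illustration. This is not the code in this PR: the function names and sample data are made up, and it assumes each item in a structured array is a hash with `content_type` and `content` keys. The walk stands in for the `..content` descendant selector and visits a parent match before the matches nested inside it.

```python
from __future__ import annotations

from typing import Any, Iterator


def descendant_values(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth, parent first.

    Hypothetical stand-in for the JSONPath `..content` selector.
    """
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from descendant_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from descendant_values(item, key)


def indexable_content(blocks: Any) -> list[str]:
    """Collect text for the index, skipping secondary matches."""
    ignore: set[str] = set()
    results: list[str] = []
    for match in descendant_values(blocks, "content"):
        if isinstance(match, list):
            # Structured array (assumed shape): index the rendered HTML and
            # remember every inner string so its own match is skipped later.
            for item in match:
                ignore.add(item["content"])
                if item.get("content_type") == "text/html":
                    results.append(item["content"])
        elif isinstance(match, str):
            if match in ignore:
                # Secondary match of an item we already handled.
                ignore.discard(match)
            else:
                results.append(match)
    return results


# Made-up sample data: one plain block, one govspeak block, one nested block.
blocks = [
    {"type": "hero", "content": "Plain hero text"},
    {
        "type": "govspeak",
        "content": [
            {"content_type": "text/govspeak", "content": "## Heading"},
            {"content_type": "text/html", "content": "<h2>Heading</h2>"},
        ],
    },
    {"type": "columns", "blocks": [{"type": "text", "content": "Nested text"}]},
]

print(len(list(descendant_values(blocks, "content"))))  # 5 (3 from the govspeak block)
print(indexable_content(blocks))
# ['Plain hero text', '<h2>Heading</h2>', 'Nested text']
```

One caveat with using a set here: if two items in the same array carry identical strings, the set stores them once, so the second secondary match would slip through. A counter would avoid that if it ever matters.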
@richardTowers richardTowers force-pushed the towers/handle-landing-page-content branch from 7faaad2 to f75bcca Compare October 23, 2024 11:18
@richardTowers richardTowers force-pushed the towers/handle-landing-page-content branch from f75bcca to b259b7f Compare October 23, 2024 11:29
@KludgeKML KludgeKML force-pushed the handle-landing-page-content branch from 3f4d8c8 to 4c9c185 Compare October 23, 2024 12:24
@richardTowers richardTowers deleted the towers/handle-landing-page-content branch October 23, 2024 13:03